
Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

Neural Information Processing Systems

Deep neural networks have achieved dramatic success through the optimization method of stochastic gradient descent (SGD). However, it is still unclear how to tune hyper-parameters, especially batch size and learning rate, to ensure good generalization. This paper reports both theoretical and empirical evidence for a training strategy: keep the ratio of batch size to learning rate from growing too large in order to achieve good generalization. Specifically, we prove a PAC-Bayes generalization bound for neural networks trained by SGD that is positively correlated with the ratio of batch size to learning rate. This correlation provides the theoretical foundation for the training strategy. Furthermore, we conduct a large-scale experiment to verify the correlation and the training strategy. We trained 1,600 models based on the architectures ResNet-110 and VGG-19 with the datasets CIFAR-10 and CIFAR-100 while strictly controlling unrelated variables. Accuracies on the test sets were collected for evaluation. Spearman's rank-order correlation coefficients and the corresponding $p$-values on 164 groups of the collected data demonstrate that the correlation is statistically significant, which fully supports the training strategy.
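The training strategy in the abstract — keep the ratio of batch size to learning rate from growing too large — can be sketched as a simple hyper-parameter heuristic. This is an illustrative sketch only; the function name and the `max_ratio` ceiling are assumptions for demonstration, not values taken from the paper:

```python
# Hedged heuristic sketch (the name pick_learning_rate and the ceiling
# value 1024 are illustrative assumptions, not from the paper): choose
# the learning rate so that batch_size / learning_rate stays at or
# below a chosen ceiling.
def pick_learning_rate(batch_size, max_ratio=1024.0):
    return batch_size / max_ratio

lr = pick_learning_rate(128)
assert 128 / lr <= 1024.0  # the resulting ratio respects the ceiling
```

Under this reading of the strategy, doubling the batch size would call for doubling the learning rate as well, so the ratio stays fixed.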


Reviews: Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

Neural Information Processing Systems

Theory-wise, the authors overlooked discussing several prior works, some of which suggested theories opposite to theirs. For example, "Don't Decay the Learning Rate, Increase the Batch Size" (ICLR 2018) seems to support a constant batch-size-to-learning-rate ratio empirically.

--- after rebuttal ---

After reading the comments and the authors' rebuttal, I am satisfied with the responses. The paper theoretically verifies that the ratio of batch size to learning rate is positively related to the generalization error. Specifically, it verifies some very recent empirical findings, e.g., "Don't Decay the Learning Rate, Increase the Batch Size" (ICLR 2018), which empirically states that increasing the batch size and decaying the learning rate are quantitatively equivalent. I think the theoretical result is novel and timely and would interest many readers in the deep learning community.
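The equivalence the reviewer cites — decaying the learning rate versus increasing the batch size — can be illustrated with two toy schedules whose batch-size/learning-rate ratios coincide at every stage. All constants below are illustrative assumptions, not values from either paper:

```python
# Two toy schedules (illustrative constants) with identical
# batch-size/learning-rate ratio trajectories:
# (a) fix the batch size and halve the learning rate each stage;
# (b) fix the learning rate and double the batch size each stage.
def decay_lr(stages, lr0=0.1, bs0=128, factor=0.5):
    return [(bs0, lr0 * factor ** i) for i in range(stages)]

def grow_bs(stages, lr0=0.1, bs0=128, factor=2.0):
    return [(int(bs0 * factor ** i), lr0) for i in range(stages)]

ratios_a = [bs / lr for bs, lr in decay_lr(3)]
ratios_b = [bs / lr for bs, lr in grow_bs(3)]
assert ratios_a == ratios_b  # the per-stage ratios coincide
```

If the ratio is what governs generalization, as the paper argues, the two schedules should behave equivalently, which is the ICLR 2018 observation.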


Reviews: Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

Neural Information Processing Systems

The paper proves a new upper bound on the generalization error of algorithms trained by SGD; the bound is positively correlated with the ratio of batch size to learning rate, so a larger ratio predicts worse generalization. The authors conducted experiments on a large number of models to verify the theoretical findings. The reviewers have mixed opinions on the paper. On one hand, the paper studies a problem important to the deep learning community, and the theoretical result has its uniqueness (e.g., regarding the ratio of batch size to learning rate), although some discussion of its relation to previous PAC-Bayes bounds is missing and some assumptions in the theory need more justification. On the other hand, the suggestions resulting from the experiments (e.g., always increase the learning rate) seem not very reasonable and need more empirical verification.



Constrained and Composite Optimization via Adaptive Sampling Methods

Xie, Yuchen, Bollapragada, Raghu, Byrd, Richard, Nocedal, Jorge

arXiv.org Machine Learning

The motivation for this paper stems from the desire to develop an adaptive sampling method for solving constrained optimization problems in which the objective function is stochastic and the constraints are deterministic. The method proposed in this paper is a proximal gradient method that can also be applied to the composite optimization problem min f(x) + h(x), where f is stochastic and h is convex (but not necessarily differentiable). Adaptive sampling methods employ a mechanism for gradually improving the quality of the gradient approximation so as to keep computational cost to a minimum. The mechanism commonly employed in unconstrained optimization is no longer reliable in the constrained or composite optimization settings because it is based on pointwise decisions that cannot correctly predict the quality of the proximal gradient step. The method proposed in this paper measures the result of a complete step to determine if the gradient approximation is accurate enough; otherwise a more accurate gradient is generated and a new step is computed. Convergence results are established both for strongly convex and general convex f. Numerical experiments are presented to illustrate the practical behavior of the method.
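As a simplified illustration of the proximal gradient step the abstract refers to, the sketch below takes one step x⁺ = prox_{αh}(x − α∇f(x)) for the composite problem min f(x) + h(x), with h(x) = λ‖x‖₁ so the prox is soft-thresholding. This is a deterministic toy, assuming an exact gradient; it does not implement the paper's adaptive sampling mechanism:

```python
import numpy as np

def soft_threshold(v, t):
    # prox of t * ||.||_1: shrink each entry toward zero by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient_step(x, grad_f, alpha, lam):
    # x+ = prox_{alpha * lam * ||.||_1}(x - alpha * grad_f(x))
    return soft_threshold(x - alpha * grad_f(x), alpha * lam)

# Toy smooth term: f(x) = 0.5 * ||x - b||^2, so grad_f(x) = x - b.
b = np.array([3.0, -0.2, 0.0])
x = prox_gradient_step(np.zeros(3), lambda z: z - b, alpha=1.0, lam=0.5)
# with alpha = 1 the gradient step lands exactly on b, and the
# soft-threshold then zeroes entries smaller than alpha*lam in magnitude
```

The adaptive sampling question the paper addresses is orthogonal to this step: it concerns how accurately the stochastic gradient must approximate ∇f before such a step (measured as a whole, per the abstract) is trustworthy.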


Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

He, Fengxiang, Liu, Tongliang, Tao, Dacheng

Neural Information Processing Systems

Deep neural networks have achieved dramatic success through the optimization method of stochastic gradient descent (SGD). However, it is still unclear how to tune hyper-parameters, especially batch size and learning rate, to ensure good generalization. This paper reports both theoretical and empirical evidence for a training strategy: keep the ratio of batch size to learning rate from growing too large in order to achieve good generalization. Specifically, we prove a PAC-Bayes generalization bound for neural networks trained by SGD that is positively correlated with the ratio of batch size to learning rate. This correlation provides the theoretical foundation for the training strategy. Furthermore, we conduct a large-scale experiment to verify the correlation and the training strategy.